Left-ventricular ejection fraction (LVEF) is an important indicator of heart failure. Existing methods for LVEF estimation from video require large amounts of annotated data to achieve high performance, e.g. using 10,030 labeled echocardiogram videos to achieve mean absolute error (MAE) of 4.10. Labeling these videos, however, is time-consuming and limits potential downstream applications to other heart diseases. This paper presents the first semi-supervised approach for LVEF prediction. Unlike general video prediction tasks, LVEF prediction is specifically related to changes in the left ventricle (LV) in echocardiogram videos. By incorporating knowledge learned from predicting LV segmentations into LVEF regression, we can provide additional context to the model for better predictions. To this end, we propose a novel Cyclical Self-Supervision (CSS) method for learning video-based LV segmentation, which is motivated by the observation that the heartbeat is a cyclical process with temporal repetition. Prediction masks from our segmentation model can then be used as additional input for LVEF regression to provide spatial context for the LV region. We also introduce teacher-student distillation to distill the information from LV segmentation masks into an end-to-end LVEF regression model that only requires video inputs. Results show our method outperforms alternative semi-supervised methods and can achieve MAE of 4.17, which is competitive with state-of-the-art supervised performance, using half the number of labels. Validation on an external dataset also shows improved generalization ability from using our method. Our code is available at https://github.com/xmed-lab/CSS-SemiVideo.
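The teacher-student distillation step described above can be sketched as a loss combining supervised regression with feature matching. This is a minimal numpy illustration only; the feature dimensions, the weighting `alpha`, and the use of plain MSE for both terms are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def distillation_loss(student_feat, teacher_feat, student_pred, lvef_label, alpha=0.5):
    """Toy distillation objective: the video-only student regressor is trained
    to match both the ground-truth LVEF and the features of a teacher that
    additionally sees the predicted LV segmentation masks.
    `alpha` is a hypothetical trade-off weight."""
    regression = np.mean((student_pred - lvef_label) ** 2)  # supervised MSE on LVEF
    distill = np.mean((student_feat - teacher_feat) ** 2)   # match teacher features
    return regression + alpha * distill

rng = np.random.default_rng(0)
s_feat, t_feat = rng.normal(size=128), rng.normal(size=128)
loss = distillation_loss(s_feat, t_feat, student_pred=55.0, lvef_label=60.0)
```

At inference time only the student is kept, so the deployed model needs video frames alone, with no segmentation branch.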
Fully-supervised semantic segmentation learns from dense masks, which incurs heavy annotation costs in a closed-set setting. In this paper, we use natural language as supervision, without any pixel-level annotation, for open-world segmentation. We call the proposed framework FreeSeg, where the masks are obtained free from the raw feature maps of a pretrained model. Compared with zero-shot or open-set segmentation, FreeSeg requires no annotated masks and can broadly predict categories beyond the classes seen in unsupervised segmentation. Specifically, FreeSeg obtains free masks from the Image-Text Similarity Map (ITSM) of an Interpretable Contrastive Language-Image Pretraining (ICLIP) model. Our core improvements are a smoothed min-pooling for dense ICLIP, together with partial-label and pixel-level segmentation strategies. Moreover, FreeSeg is simple, without complex designs such as grouping, clustering, or retrieval. Besides its simplicity, FreeSeg surpasses the previous state of the art by large margins, e.g., by 13.4% mIoU under the same settings.
Contrastive Language-Image Pretraining (CLIP) learns rich representations via readily available natural language supervision. It improves general performance on downstream vision tasks, including but not limited to zero-shot recognition, long-tail classification, segmentation, retrieval, captioning, and video. However, to the best of our knowledge, the visual interpretability of CLIP has not yet been studied. To provide visual explanations of its predictions, we propose the Image-Text Similarity Map (ITSM). Based on it, we surprisingly find that CLIP prefers background regions over the foreground and produces erroneous visualizations with respect to human understanding. Experimentally, we find the devil is in the pooling part, where inappropriate pooling methods lead to a phenomenon we call semantic shift. To correct and improve the visualization results, we propose masked max pooling, using the attention map from a self-supervised image encoder. Meanwhile, interpretability and recognition tasks require different representations. To address this problem, we propose dual projections to satisfy both requirements. We integrate the above methods as Interpretable Contrastive Language-Image Pretraining (ICLIP). Experiments show that ICLIP greatly improves interpretability, e.g., with nontrivial improvements of 32.85% and 49.10% on the VOC 2012 dataset.
Video shadow detection aims to produce consistent shadow predictions across video frames. However, current methods suffer from inconsistent shadow predictions across frames, especially when the illumination and background textures change within a video. We observe that the inconsistent predictions are caused by inconsistent shadow features, i.e., the features of the same shadow region exhibit dissimilar patterns between nearby frames. In this paper, we propose a novel Shadow-Consistent Correspondence method (SC-Cor) to enhance the pixel-wise similarity of specific shadow regions across frames for video shadow detection. Our proposed SC-Cor has three main advantages. First, without requiring dense pixel-to-pixel correspondence labels, SC-Cor can learn pixel-wise correspondence across frames in a weakly-supervised manner. Second, SC-Cor considers intra-shadow separability, which is robust to the varying textures and illumination in videos. Finally, SC-Cor is a plug-and-play module that can be easily integrated without extra computational cost. We further design a new evaluation metric to assess the temporal stability of video shadow detection results. Experimental results show that SC-Cor outperforms the prior state-of-the-art method by 6.51% on IoU and by 3.35% on the newly introduced temporal stability metric.
This paper studies the few-shot skin disease classification problem. Based on the crucial observation that skin disease images often exist in multiple sub-clusters within a class (i.e., the appearances of images within one class of disease vary and form multiple distinct sub-groups), we design a novel Sub-Cluster-Aware Network, namely SCAN, to improve the accuracy of rare skin disease diagnosis. Since the performance of few-shot learning highly depends on the quality of the learned feature encoder, the main principle guiding the design of SCAN is intrinsic sub-clustered representation learning for each class, so as to better describe the feature distributions. Specifically, SCAN follows a dual-branch framework, where the first branch learns class-wise features to distinguish different skin diseases, and the second branch learns features that can effectively partition each class into several groups, so as to preserve the sub-clustered structure within each class. To achieve the objective of the second branch, we present a cluster loss to learn image similarities via unsupervised clustering. To ensure that the samples in each sub-cluster come from the same class, we further design a purity loss to refine the unsupervised clustering results. We evaluate the proposed approach on two public datasets for few-shot skin disease classification. The experimental results validate that our framework outperforms other state-of-the-art methods by around 2% to 4% on the SD-198 and Derm7pt datasets.
Developing AI-assisted gland segmentation methods from histology images is critical for automatic cancer diagnosis and prognosis. However, the high cost of pixel-level annotations hinders their application to broader diseases. Existing weakly-supervised semantic segmentation methods in computer vision achieve degenerate results for gland segmentation, since the characteristics and problems of gland datasets differ from those of general object datasets. We observe that, unlike natural images, the key problem with histology images is inter-class confusion owing to morphological homogeneity and low color contrast among different tissues. To this end, we propose a novel method, Online Easy Example Mining (OEEM), that encourages the network to focus on credible supervision signals rather than noisy ones, thereby mitigating the influence of the inevitably wrong predictions in pseudo-masks. According to the characteristics of gland datasets, we design a strong framework for gland segmentation. Our results exceed many fully-supervised and weakly-supervised methods for gland segmentation by more than 4.4% and 6.04% in mIoU, respectively. Code is available at https://github.com/xmed-lab/OEEM.
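The core idea of focusing on credible supervision signals can be sketched as a per-pixel loss re-weighting that down-weights high-loss (likely mislabeled) pixels. This numpy toy is a simplified stand-in, assuming an exponential weighting with a hypothetical `gamma` rather than the paper's exact formulation:

```python
import numpy as np

def oeem_weighted_loss(pixel_losses, gamma=2.0):
    """Toy online easy example mining: pixels with high loss under the
    pseudo-mask are likely wrongly labelled, so they receive small weights
    and the network focuses on the easy, reliable pixels.
    The exp weighting and `gamma` are illustrative choices."""
    weights = np.exp(-gamma * pixel_losses)  # easy (low-loss) pixels get large weight
    weights = weights / weights.sum()        # normalize to a distribution
    return float(np.sum(weights * pixel_losses))

losses = np.array([0.1, 0.2, 5.0])  # last pixel's pseudo label is probably noisy
weighted = oeem_weighted_loss(losses)
```

The weighted loss sits well below the plain mean, since the noisy pixel contributes almost nothing.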
Self-supervised learning has witnessed great progress in vision and NLP; recently, it has also attracted wide attention for various medical imaging modalities such as X-ray, CT, and MRI. Existing methods mostly focus on building new pretext self-supervision tasks, such as reconstruction, orientation, and masking identification, according to the properties of medical images. However, publicly available self-supervised models are not fully exploited. In this paper, we present a powerful yet efficient self-supervision framework for surgical video understanding. Our key insight is to distill knowledge from publicly available models trained on large generic datasets to facilitate self-supervised learning on surgical videos. To this end, we first introduce a semantic-preserving training scheme to obtain our teacher model, which not only retains the semantics of the publicly available models but also produces accurate knowledge for surgical data. Besides training with contrastive learning alone, we also introduce a distillation objective to transfer the rich learned information from the teacher model to self-supervised learning on surgical data. Extensive experiments on two surgical phase recognition benchmarks show that our framework can significantly improve the performance of existing self-supervised learning methods. Notably, our framework demonstrates a compelling advantage in the low-data regime. Our code is available at https://github.com/xmed-lab/DistillingSelf.
Deep hashing has been extensively utilized in massive image retrieval because of its efficiency and effectiveness. However, deep hashing models are vulnerable to adversarial examples, making it essential to develop adversarial defense methods for image retrieval. Existing solutions achieved limited defense performance because they use weak adversarial samples for training and lack discriminative optimization objectives to learn robust features. In this paper, we present a min-max based Center-guided Adversarial Training, namely CgAT, to improve the robustness of deep hashing networks through worst-case adversarial examples. Specifically, we first formulate the center code as a semantically-discriminative representative of the input image content, which preserves the semantic similarity with positive samples and dissimilarity with negative examples. We prove that the center code can be obtained in closed form via a simple mathematical formula. After obtaining the center codes in each optimization iteration of the deep hashing network, they are adopted to guide the adversarial training process. On the one hand, CgAT generates the worst adversarial examples as augmented data by maximizing the Hamming distance between the hash codes of the adversarial examples and the center codes. On the other hand, CgAT learns to mitigate the effects of adversarial samples by minimizing the Hamming distance to the center codes. Extensive experiments on the benchmark datasets demonstrate the effectiveness of our adversarial training algorithm in defending against adversarial attacks for deep hashing-based retrieval. Compared with the current state-of-the-art defense method, we significantly improve the defense performance by an average of 18.61%, 12.35%, and 11.56% on FLICKR-25K, NUS-WIDE, and MS-COCO, respectively.
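The center-code idea above can be sketched in a few lines of numpy. The sign-of-difference rule below is a simplified stand-in for the paper's closed-form solution, and the tiny 4-bit codes are purely illustrative:

```python
import numpy as np

def center_code(pos_codes, neg_codes):
    """Toy center code: a binary (+1/-1) code that is close in Hamming
    distance to positive hash codes and far from negative ones.
    Summing positives minus negatives and taking the sign is a
    simplified stand-in for the paper's closed-form formula."""
    return np.sign(pos_codes.sum(axis=0) - neg_codes.sum(axis=0))

def hamming(a, b):
    """Hamming distance between two +1/-1 codes."""
    return int(np.sum(a != b))

pos = np.array([[1, 1, -1, 1], [1, 1, -1, -1]])  # codes of semantically similar images
neg = np.array([[-1, -1, 1, -1]])                # code of a dissimilar image
c = center_code(pos, neg)
```

During adversarial training, the attacker would then perturb an image to *maximize* `hamming(hash(x_adv), c)`, while the network is trained to minimize it.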
Surgical phase recognition is a fundamental task in computer-assisted surgery systems. Most existing works rely on expensive and time-consuming full annotations, which require surgeons to repeatedly watch videos to find the precise start and end times of each surgical phase. In this paper, we introduce timestamp supervision for surgical phase recognition, training the models with timestamp annotations, where the surgeons are asked to identify only a single timestamp within the temporal boundary of a phase. This annotation scheme significantly reduces the manual annotation cost compared to full annotations. To make full use of such timestamp supervision, we propose a novel method called uncertainty-aware temporal diffusion (UATD) to generate trustworthy pseudo labels for training. Our proposed UATD is motivated by a property of surgical videos, i.e., the phases are long events consisting of consecutive frames. To be specific, UATD iteratively diffuses the single labelled timestamp to its corresponding high-confidence (i.e., low-uncertainty) neighbouring frames. Our study uncovers unique insights into surgical phase recognition with timestamp supervision: 1) timestamp annotation reduces annotation time by 74% compared with full annotation, and surgeons tend to annotate timestamps near the middle of phases; 2) extensive experiments demonstrate that our method achieves results competitive with fully supervised methods while reducing manual annotation cost; 3) less is more in surgical phase recognition, i.e., fewer but discriminative pseudo labels outperform full labels that contain ambiguous frames; 4) the proposed UATD can be used as a plug-and-play method to clean ambiguous labels near the boundaries between phases and improve the performance of current surgical phase recognition methods.
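The diffusion step can be sketched as expanding a pseudo-labelled segment outward from the annotated timestamp while the model's confidence stays high. This is a minimal toy in plain Python; the confidence scores, the fixed threshold, and the simple left/right scan are illustrative assumptions standing in for UATD's iterative uncertainty estimation:

```python
def diffuse_timestamp(confidence, t, threshold=0.7):
    """Toy uncertainty-aware diffusion: starting from the single annotated
    timestamp `t`, grow the pseudo-labelled segment to neighbouring frames
    while their confidence for the annotated phase stays above `threshold`
    (i.e., uncertainty stays low). Returns the inclusive frame range."""
    lo = hi = t
    while lo > 0 and confidence[lo - 1] >= threshold:
        lo -= 1  # diffuse to the left while confident
    while hi < len(confidence) - 1 and confidence[hi + 1] >= threshold:
        hi += 1  # diffuse to the right while confident
    return lo, hi

# one labelled timestamp at frame 3; low-confidence frames bound the phase
conf = [0.2, 0.8, 0.9, 0.95, 0.85, 0.3, 0.1]
segment = diffuse_timestamp(conf, t=3)
```

Frames outside the returned segment are left unlabelled, which is exactly the "less but discriminative" behaviour the abstract describes.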
Image regression tasks, such as bone mineral density (BMD) estimation and left-ventricular ejection fraction (LVEF) prediction, play an important role in computer-aided disease assessment. Most deep regression methods train the neural network with a single regression loss function, such as MSE or L1 loss. In this paper, we propose the first contrastive learning framework for deep image regression, namely AdaCon, which consists of a feature learning branch with a novel adaptive-margin contrastive loss and a regression prediction branch. Our method incorporates label distance relationships as part of the learned feature representations, which allows for better performance on downstream regression tasks. Moreover, it can be used as a plug-and-play module to improve the performance of existing regression methods. We demonstrate the effectiveness of AdaCon on two tasks: bone mineral density estimation from X-ray images and left-ventricular ejection fraction prediction from echocardiogram videos. AdaCon leads to relative improvements in MAE of 3.3% and 5.9% over state-of-the-art BMD estimation and LVEF prediction methods, respectively.
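The adaptive-margin idea can be sketched as a pairwise hinge whose margin grows with the label distance, so samples with very different labels are pushed further apart in feature space. This numpy toy is illustrative only; the linear margin `scale * |y_i - y_j|` and the plain Euclidean feature distance are assumptions, not the paper's exact loss:

```python
import numpy as np

def adaptive_margin_contrastive(feats, labels, scale=1.0):
    """Toy adaptive-margin contrastive loss for regression: each pair must be
    separated in feature space by a margin proportional to its label distance.
    The linear margin and Euclidean distance are illustrative choices."""
    n = len(labels)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            margin = scale * abs(labels[i] - labels[j])    # label-aware margin
            dist = np.linalg.norm(feats[i] - feats[j])     # feature distance
            total += max(0.0, margin - dist)               # hinge: penalize pairs too close
            pairs += 1
    return total / pairs

feats = np.array([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]])
labels = [10.0, 50.0, 50.0]  # e.g., LVEF values
loss = adaptive_margin_contrastive(feats, labels)
```

Note that the pair with identical labels (the last two samples) contributes zero loss regardless of its feature distance, while the pairs spanning a 40-point label gap are penalized unless well separated.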